Binary text classification using genetic programming with crossover-based oversampling for imbalanced datasets
نویسندگان
چکیده
It is well known that classifiers trained using imbalanced datasets usually have a bias toward the majority class. In this context, classification models can present high performance overall and for class, even when minority class significantly lower. This paper presents genetic programming (GP) model with crossover-based oversampling technique dataset binary text classification. The aim of study to apply an solve issue improve GP employed proposed technique. employs crossover operator generating new samples in dataset. By combination GP, was improved. shown outperforms all applications use original without resampling. Moreover, system surpassed approaches synthetic (SMOTE) random oversampling. Further comparison state-of-the-art on five terms F1-score shows superior approach.
منابع مشابه
Generative Oversampling for Mining Imbalanced Datasets
One way to handle data mining problems where class prior probabilities and/or misclassification costs between classes are highly unequal is to resample the data until a new, desired class distribution in the training data is achieved. Many resampling techniques have been proposed in the past, and the relationship between resampling and cost-sensitive learning has been well studied. Surprisingly...
متن کاملOversampling Method for Imbalanced Classification
Classification problem for imbalanced datasets is pervasive in a lot of data mining domains. Imbalanced classification has been a hot topic in the academic community. From data level to algorithm level, a lot of solutions have been proposed to tackle the problems resulted from imbalanced datasets. SMOTE is the most popular data-level method and a lot of derivations based on it are developed to ...
متن کاملAdaptive Oversampling for Imbalanced Data Classification
Data imbalance is known to significantly hinder the generalization performance of supervised learning algorithms. A common strategy to overcome this challenge is synthetic oversampling, where synthetic minority class examples are generated to balance the distribution between the examples of the majority and minority classes. We present a novel adaptive oversampling algorithm, VIRTUAL, that comb...
متن کاملUsing Self-organizing Maps for Binary Classification with Highly Imbalanced Datasets
Highly imbalanced datasets occur in domains like fraud detection, fraud prediction, and clinical diagnosis of rare diseases, among others. These datasets are characterized by the existence of a prevalent class (e.g. legitimate sellers) while the other is relatively rare (e.g. fraudsters). Although small in proportion, the observations belonging to the minority class can be of a crucial importan...
متن کاملClassification in Imbalanced Datasets
In this thesis we study the classification task in the presence of class imbalanced data. This task arises in many applications when we are interested in the under-represented (minority) classes. Examples of such applications are related to fraud detection, medical diagnosis and monitoring, text categorization, risk management, information retrieval and filtering. Although there exist many stan...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Turkish Journal of Electrical Engineering and Computer Sciences
سال: 2023
ISSN: ['1300-0632', '1303-6203']
DOI: https://doi.org/10.55730/1300-0632.3978